Performance of a Multicore Matrix Multiplication Library

Author

Frank Lauginiger

Abstract

Multicore processors promise dramatic improvements in performance, but their diverse and often unique architectures are a major inhibitor to software adoption. Algorithm libraries that operate at the chip level and are optimized across multiple cores provide the quickest route by which programmers can port or develop high-performance software for multicores. This paper reports on a flexible matrix multiplication library for the Cell Broadband Engine™ (Cell BE) processor that meets or exceeds the performance of known matrix multiplication implementations on the Cell. In addition, the library operates within a larger framework for programming multicores that enables programmers to combine library code with multicore functions they have developed themselves.
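The abstract does not describe how the library distributes work. As a rough, hypothetical illustration only, here is the generic block-partitioning idea that multicore matrix multiplication commonly builds on: the output matrix C is cut into tiles, and each tile is computed independently by a worker that streams blocks of A and B. This is not the library's API or its Cell BE code. The function names, the `block` size and the `workers` count (defaulting to 8 as a nod to the Cell BE's eight synergistic processing elements) are assumptions, and Python threads stand in for cores. Under CPython's global interpreter lock these threads do not execute the arithmetic in parallel, so the sketch demonstrates the work decomposition, not a speedup.

```python
# Hypothetical illustration only: not the API or code of the library
# described in the abstract. Threads stand in for cores (e.g. Cell BE SPEs).
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def _tile_product(A, B, i0, j0, block):
    """Compute the block x block tile of C = A @ B whose top-left is (i0, j0).

    The k dimension is blocked too, mimicking a core that streams one
    panel of A and one panel of B through a small local memory at a time.
    """
    n, m, p = len(A), len(B), len(B[0])
    i1, j1 = min(i0 + block, n), min(j0 + block, p)
    tile = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
    for k0 in range(0, m, block):
        k1 = min(k0 + block, m)
        for i in range(i0, i1):
            row, a_row = tile[i - i0], A[i]
            for k in range(k0, k1):
                a, b_row = a_row[k], B[k]
                for j in range(j0, j1):
                    row[j - j0] += a * b_row[j]
    return i0, j0, tile


def blocked_parallel_matmul(A, B, block=32, workers=8):
    """Return C = A @ B for lists of lists, farming output tiles out to workers."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions do not match")
    p = len(B[0])
    C = [[0.0] * p for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each (i0, j0) pair names one independent output tile.
        futures = [pool.submit(_tile_product, A, B, i0, j0, block)
                   for i0, j0 in product(range(0, n, block), range(0, p, block))]
        for fut in futures:
            i0, j0, tile = fut.result()
            # Tiles are disjoint, so copying them back needs no locking.
            for di, row in enumerate(tile):
                C[i0 + di][j0:j0 + len(row)] = row
    return C
```

For example, `blocked_parallel_matmul(A, B, block=2, workers=3)` on a 3×5 and a 5×4 matrix splits C into four tiles. Because every tile of C is owned by exactly one worker, no synchronization is needed beyond collecting results. A real Cell BE implementation would additionally have to move each operand panel into an SPE's 256 KB local store by DMA, which this sketch does not model.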

Related Sources

PERI - Auto-tuning memory-intensive kernels for multicore

Abstract. We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to sparse matrix vector multiplication (SpMV), the explicit heat equation PDE on...

Subdivision Surface Evaluation as Sparse Matrix-Vector Multiplication

We present an interpretation of subdivision surface evaluation in the language of linear algebra. Specifically, the vector of surface points can be computed by left-multiplying the vector of control points by a sparse subdivision matrix. This “matrix-driven” interpretation applies to any level of subdivision, holds for many common subdivision schemes (including Catmull-Clark and Loop), supports...

Hybrid Algorithms for Matrix Multiplication on Multicore Clusters

Hybrid programming (through messages and shared memory) has gained importance since the appearance of multicore cluster architectures, which resulted from technological advances in processors and the physical limitations of traditional architectures. This new programming paradigm makes it possible to exploit the new memory hierarchy offered by the architecture. The purpose of this work is to carry out a c...

Fast recursive matrix multiplication for multi-core architectures

In this article, we present a fast algorithm for matrix multiplication optimized for recent multicore architectures. The implementation exploits different methodologies from parallel programming, like recursive decomposition, efficient low-level implementations of basic blocks, software prefetching, and task scheduling resulting in a multilevel algorithm with adaptive features. Measurements on ...

Effective Implementation of DGEMM on Modern Multicore CPU

In this paper we present a detailed study on tuning double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon E5-2680 CPU. We selected an optimal algorithm from the instruction set perspective as well as software tools optimized for Intel Advanced Vector Extensions (AVX). Our optimizations included the use of vector memory operations and AVX instructions. Our proposed algorithm ...

Publication date: 2007